We propose an explicit recursive method to approximate a power-law with a finite sum of weighted
exponentials. Applications to moving averages with long memory are discussed in relation to
stochastic volatility models.



Exponential moving averages are widely used as a tool for efficiently computing averages of
time-varying quantities such as volatility and price. Their main advantage resides in their
recursive definition, which allows for easy numerical implementation and for remarkably simple
models of stochastic volatility, such as GARCH. Their use is however conceptually questionable
when the process in question has long memory, as volume and volatility do [3]. One should
rather consider a power-law kernel; this however requires considerably more computing power,
as one must keep track of all the data points. Some authors approximate a power-law with a sum
of exponentials; the record is held by a study that uses 600 exponentials for 2 decades but
notes that only a few of them contribute significantly to the final function.

While the principle of economy dictates fitting power-law-looking data with nothing other than
a power-law (see for instance the controversy in the June 2001 issue of Quantitative Finance),
computing real-time averages with a power-law kernel is much eased by the use of a sum of
exponentials. Recent stochastic volatility models, for instance, use sums of exponentials
(5, 12 and infinitely many, depending on the model) with algebraically decreasing weights and
algebraically increasing characteristic times, thereby respecting the long memory of the
volatility, which might explain in part their forecasting performance. It is clear that only a
handful of exponentials are required to approximate a power-law up to a given order of
magnitude, as many practitioners are aware. Since financial time series do not extend over an
infinite period, such an approximation is good enough for application to financial
time-correlations. How many exponentials should be used, and with what parameters, seems never
to be discussed in the literature. Here, we aim to derive a new, explicit and simple scheme
that improves on the often-used approximation; in addition, we show that the usual assumption
of independent contributions from each exponential implies the existence of an optimal number
of exponentials.

Let $f(x) = x^{-\alpha}$ and $g(x) = \sum_{i=0}^{N} g_i(x)$, where
$g_i(x) = w_i \exp(-\lambda_i x)$. Assume that one would like to approximate $f$ with $g$ from
$x = 1$ to $x = 10^k$, that is, over $k$ decades. The standard approach (see for instance [9])
consists in defining a cost function per decade, $C$, that is the integral of some measure of
the difference between $f$ and $g$, i.e.




and to minimise $C$ with respect to $w_i$ and $\lambda_i$, so as to obtain $2(N+1)$ coupled
non-linear equations. Ad hoc numerical methods that solve the resulting set of equations by
using the Gram-Schmidt orthonormalisation of exponentials were investigated long ago. Our aim
here is to obtain a sub-optimal (with respect to $C$) but explicit set of $w_i$ and
$\lambda_i$.

The proposed method relies on a simple ansatz for $w_i$ and $\lambda_i$. Instead of trying to
solve an intricate set of non-linear equations, one observes that the nature of a power-law is
to be scale-free, whereas an exponential has a well-defined scale. Therefore, the role of each
exponential is to approximate a given region of the $k$ decades. In particular, one wishes the
$i$-th exponential to approximate $f(x)$ correctly at $x_i = \beta^i$, where $\beta > 1$ is a
constant. This already suggests that $\lambda_i \propto \beta^{-i}$, which is both intuitive
and well known. Then one matches $g$ to $f$ and its first derivative $g'$ to $f'$ at
$x_i = \beta^i$. However, once again, this would yield $2(N+1)$ coupled non-linear equations.
The key observation is that, provided that $\beta$ is large enough (see below), only $g_i$
contributes significantly to $g$ at $x_i$, i.e. $g(x_i) \simeq g_i(x_i)$. We therefore solve
$g_i(x_i) = f(x_i)$ and $g_i'(x_i) = f'(x_i)$, which gives

$$\lambda_i = \alpha \beta^{-i} \tag{2}$$

$$w_i = \left(\frac{e}{\beta^i}\right)^{\alpha} \tag{3}$$
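As a quick numerical check of Eqs. (2) and (3), the following Python sketch evaluates a single exponential at $x_i = \beta^i$ and compares its value and slope with those of the power-law. The function name and the parameter values are illustrative choices, not taken from the text.

```python
import math

def single_exponential(alpha, beta, i):
    """Return (lambda_i, w_i) as given by Eqs. (2) and (3)."""
    lam = alpha * beta ** (-i)
    w = (math.e / beta ** i) ** alpha
    return lam, w

alpha, beta, i = 2.0, 5.0, 3          # illustrative values
x_i = beta ** i
lam, w = single_exponential(alpha, beta, i)

g_i = w * math.exp(-lam * x_i)        # value of the i-th exponential at x_i
dg_i = -lam * g_i                     # its first derivative at x_i
f = x_i ** (-alpha)                   # power-law value at x_i
df = -alpha * x_i ** (-alpha - 1)     # power-law derivative at x_i

print(g_i, f)      # the two values should agree up to rounding
print(dg_i, df)    # and so should the two slopes
```

Both pairs agree up to floating-point rounding, which is all that Eqs. (2) and (3) guarantee: the match holds for one exponential taken in isolation, not for the full sum $g$.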

FIG. 1: Convergence of the approximation function g(x) to f(x) for the uniform ansatz with
2 (red line), 3 (green line) and 6 exponentials (blue line), and for the recursive ansatz with
6 exponentials (orange line); $\alpha = 2$, $\beta = 5$.

FIG. 2: Error per decade C as a function of N for various k; $\alpha = 2$;
$\beta = 10^{k/N}$, for the uniform and recursive ansatz (full and empty symbols
respectively). Lines are for eye guidance only.

However, $g(x_i) > f(x_i)$ because the contribution of the exponentials other than the $i$-th
cannot be totally ignored. Therefore, one must correct the above over-optimistic assumption by
considering that $g$ is a weighted sum of the $g_i(x)$,

$$g(x) = \sum_{i=0}^{N} c_i \, \beta^{-i\alpha} e^{\alpha} \exp(-\alpha x/\beta^i), \tag{4}$$

where $\{c_i\}$ is a set of correction factors. The last step is to solve
$g(\beta^j) = f(\beta^j)$ for $j = 0, \dots, N$, which is a set of $N+1$ linear equations in
the variables $c_i$. The complexity of the problem has thus been greatly reduced, and this set
of equations can be solved numerically. In order to obtain explicit expressions for the $c_i$,
one has to resort to another approximation.
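The numerical route mentioned above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names are hypothetical, each equation is divided by $f(\beta^j)$ so that the diagonal of the matrix equals 1, and a small Gaussian elimination is written out to keep the example free of external libraries.

```python
import math

def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(m[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def correction_factors(alpha, beta, N):
    """Solve g(beta**j) = f(beta**j), j = 0..N, for the c_i of Eq. (4)."""
    # Row j of Eq. (4) divided by f(beta**j) = beta**(-j * alpha).
    a = [[beta ** ((j - i) * alpha) * math.exp(alpha * (1 - beta ** (j - i)))
          for i in range(N + 1)] for j in range(N + 1)]
    return solve_linear(a, [1.0] * (N + 1))

def g(x, c, alpha, beta):
    """Weighted sum of exponentials, Eq. (4)."""
    return sum(ci * (math.e / beta ** i) ** alpha * math.exp(-alpha * x / beta ** i)
               for i, ci in enumerate(c))

alpha, beta, N = 2.0, 4.0, 8          # illustrative values
c = correction_factors(alpha, beta, N)
for j in range(N + 1):
    x = beta ** j
    print(j, round(c[j], 4), g(x, c, alpha, beta) / x ** (-alpha))
```

By construction the printed ratios $g(\beta^j)/f(\beta^j)$ are equal to 1 up to rounding; what the linear solve does not provide is a closed-form expression for the $c_i$.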

The simplest ansatz for $c_i$ already gives a high degree of accuracy and is equivalent to the
one currently in use in the literature. Taking a uniform $c_i = c$ given by
$1/c = \sum_{i=0}^{N} \beta^{-i\alpha} e^{\alpha} \exp(-\alpha/\beta^i)$ ensures the equality
$g(1) = f(1)$. With this choice the factor $e^{\alpha}$ disappears from $g(x)$ and

$$g(x) = \left(\sum_{i=0}^{N} \beta^{-i\alpha} \exp(-\alpha/\beta^i)\right)^{-1} \sum_{i=0}^{N} \beta^{-i\alpha} \exp(-\alpha x/\beta^i). \tag{5}$$

Fig. 1 shows how the approximation works for increasing $N$: each additional exponential
extends the range that is well approximated by a factor $\beta$. The value of $\beta$ was
chosen large enough so as to emphasise the oscillations of $g(x)$ at each $\beta^j$. The
uniform ansatz implies that while $g(1) = f(1) = 1$, $g(\beta^j) > f(\beta^j)$ for
$0 < j < N$, since the contribution of each $g_i$ is asymmetric with respect to $\beta^j$;
when $j = N$, since there are no additional exponentials from $i > j$ to contribute to $g$,
$g(\beta^N) < \beta^{-\alpha N}$ (see Fig. 1). This problem is of course negligible when a
very large number of exponentials is used; however, since our aim is to use as few
exponentials as possible, it needs to be addressed.
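The behaviour of the uniform ansatz at the matching points can be inspected with a short Python sketch of Eq. (5). The function name is illustrative; the parameters are those quoted in the caption of Fig. 1 for the six-exponential curve.

```python
import math

def g_uniform(x, alpha, beta, N):
    """Uniform-ansatz approximation of x**(-alpha), Eq. (5)."""
    norm = sum(beta ** (-i * alpha) * math.exp(-alpha / beta ** i)
               for i in range(N + 1))
    total = sum(beta ** (-i * alpha) * math.exp(-alpha * x / beta ** i)
                for i in range(N + 1))
    return total / norm

alpha, beta, N = 2.0, 5.0, 5          # six exponentials
for j in range(N + 1):
    x = beta ** j
    # ratio g / f at the matching point x = beta**j
    print(j, g_uniform(x, alpha, beta, N) / x ** (-alpha))
```

For these parameters the ratio equals 1 at $x = 1$, stays close to 1 at the intermediate points, and drops clearly below 1 at $x = \beta^N$, which is the deficit of the last exponential discussed above.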



The parameter $\beta$ tunes how much of a decade is approximated by a single exponential. When
$k$ and $N$ are fixed, it is sensible to take $\beta = 10^{k/N}$. The cost function $C$ is
plotted in Fig. 2 as a function of $N$ at fixed $k$ for several values of $k$. For small $N$,
$C$ decreases exponentially as a function of $N$. Then, strikingly, $C$ has a minimum at
$N_m(k)$ and increases slightly before stabilising; the smaller $\alpha$, the smaller the
subsequent increase. One would have naively expected $C$ to decrease monotonically as a
function of $N$; however, since $\beta$ decreases when $N$ increases at fixed $k$, the
assumption that the exponentials give independent contributions to $g$ is no longer valid at
$N \simeq N_m$, and becomes clearly incorrect when $N > N_m$. The consequence is that $g(x)$
becomes too large except at $x = 1$. This is not problematic, however, since in practice one
prefers large $\beta$ to small ones, so as to use as few exponentials as possible. As
expected, $N_m$ increases linearly with $k$: for $\alpha = 2$, the optimal
$N = N_m(k) \simeq 1.7k$, or equivalently $\beta \simeq 10^{1/1.7} \simeq 3.87$. Another
feature of this figure is that $C(N_m(k))$ decreases as a function of $k$: this is due to the
vanishing influence of the deviation caused by the downwards shift of the last exponential.
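A scan of the error per decade against $N$ can be sketched as follows. The explicit form of the cost function is not reproduced here, so the sketch assumes one: the squared difference of $\log_{10} g$ and $\log_{10} f$, averaged uniformly in $\log_{10} x$ over the $k$ decades. Both this definition and the function names are assumptions, and the numbers it prints are not expected to reproduce Fig. 2.

```python
import math

def g_uniform(x, alpha, beta, N):
    """Uniform-ansatz approximation of x**(-alpha), Eq. (5)."""
    norm = sum(beta ** (-i * alpha) * math.exp(-alpha / beta ** i)
               for i in range(N + 1))
    total = sum(beta ** (-i * alpha) * math.exp(-alpha * x / beta ** i)
                for i in range(N + 1))
    return total / norm

def cost_per_decade(alpha, k, N, steps=2000):
    """ASSUMED cost: mean squared log10 error over k decades (midpoint rule)."""
    beta = 10 ** (k / N)              # beta = 10**(k/N), requires N >= 1
    total = 0.0
    for s in range(steps):
        u = (s + 0.5) * k / steps     # u = log10(x), from 0 to k
        err = math.log10(g_uniform(10 ** u, alpha, beta, N)) + alpha * u
        total += err * err
    return total / steps

alpha, k = 2.0, 3
for N in range(1, 9):
    print(N, cost_per_decade(alpha, k, N))
```

With any reasonable error measure the first few values should fall quickly as $N$ grows; whether and where a minimum appears depends on the measure chosen, which is why the scan is worth repeating with the cost function actually in use.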

FIG. 3: Error per decade C as a function of n for various $N < N_m = 5$; $k = 3$,
$\alpha = 2$; $\beta = 10^{k/N}$. Dotted lines are for eye guidance only.

FIG. 4: Zoom on the last two exponentials, $\alpha = 2$; $\beta = 4$.

It is possible to improve the precision of the approximation for $N < N_m$ by modifying the
scale of $x$, or equivalently by taking into account derivatives of $g$ of higher orders.
Matching the second derivative yields $\lambda_i = \sqrt{\alpha(\alpha+1)}\,\beta^{-i}$. From
the conditions on the first derivatives and on the equality of functions,
$w_i \propto \beta^{-\alpha i} \exp(\sqrt{\alpha(\alpha+1)})$. This reasoning can be extended
to match the derivatives up to order $n$, resulting in

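$$g(x) = \left(\sum_{i=0}^{N} \beta^{-i\alpha} \exp(-\mu/\beta^i)\right)^{-1} \sum_{i=0}^{N} \beta^{-i\alpha} \exp(-\mu x/\beta^i) \tag{6}$$

with

$$\mu = \left[\prod_{j=0}^{n-1} (\alpha + j)\right]^{1/n} = \left[\frac{\Gamma(\alpha+n)}{\Gamma(\alpha)}\right]^{1/n}. \tag{7}$$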

Since $\mu$ does not depend on $i$, it modifies the scale of $x$, which can be used to adjust
the position in log-space of $g$ relative to $f$. For large $n$,
$\mu \simeq (n + \alpha - 1)/e$, therefore shifting $g(x)$ to larger $x$. According to
Fig. 3, as long as $N < N_m$, there is an optimal $n$. This comes from the fact that
$g(\beta^N) < f(\beta^N)$: it is more advantageous to shift $x$ to larger values so as to
avoid the too small value of $g$ at $\beta^N$. It also emphasises once again the need to solve
the problem of the last exponential.
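The scale factor $\mu$ and its large-$n$ behaviour are easy to tabulate. The Python sketch below assumes the form $\mu = [\Gamma(\alpha+n)/\Gamma(\alpha)]^{1/n}$, which reduces to $\alpha$ for $n = 1$ and to $\sqrt{\alpha(\alpha+1)}$ for $n = 2$; the function name is illustrative. It uses `math.lgamma` to avoid overflow for large $n$.

```python
import math

def mu(alpha, n):
    """Scale factor (Gamma(alpha + n) / Gamma(alpha)) ** (1 / n)."""
    return math.exp((math.lgamma(alpha + n) - math.lgamma(alpha)) / n)

alpha = 2.0
for n in (1, 2, 5, 50, 500):
    # second column: exact value; third column: large-n estimate
    print(n, mu(alpha, n), (n + alpha - 1) / math.e)
```

The two columns approach each other slowly as $n$ grows, so the estimate $(n + \alpha - 1)/e$ is best read as an order of magnitude for moderate $n$.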

The solution comes from a close examination of the figures: the first exponentials do not
contribute much to the value of $g(\beta^N)$ for $N$ not too small. This suggests that the
contribution of $g_i(\beta^j)$ to $g(\beta^j)$ can be neglected if $i < j$. As a consequence,
$g(\beta^N) \simeq g_N(\beta^N)$, and $c_N = 1$. Thus



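$$c_{N-1} = 1 - \beta^{-\alpha} e^{\alpha(1 - 1/\beta)}. \tag{8}$$

More generally,

$$c_{N-k} = 1 - \sum_{i=0}^{k-1} c_{N-i}\, \beta^{-\alpha(k-i)}\, e^{\alpha\left(1 - \beta^{-(k-i)}\right)}. \tag{9}$$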



$c_0$ is the same with both ansätze, since there is no exponential on the left of $\beta^0$.
Table I gives an example set of $c_{N-k}$. It is noticeable that the $c_{N-k}$ display
oscillations which are damped as $k$ increases: since $c_N = 1$ is large in order to
compensate for the absence of further exponentials, $c_{N-1}$ must be smaller than $c_0$;
next, $c_{N-2}$ will be slightly larger than $c_0$ so as to satisfy the corresponding matching
condition $g = f$, etc.



TABLE I: Correction coefficients given by the recursive ansatz. $\alpha = 2$, $N = 8$,
$\beta = 4$.

k         0      1      2      3      4      5      6      7      8
c_{N-k}   1.000  0.720  0.773  0.763  0.765  0.765  0.765  0.765  0.765
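The recursion of Eqs. (8) and (9) translates directly into code. The Python sketch below (function name illustrative) starts from $c_N = 1$ and works down to $c_0$, using the parameters of Table I.

```python
import math

def recursive_coefficients(alpha, beta, N):
    """Return [c_0, ..., c_N] from the recursion of Eqs. (8) and (9)."""
    c = [0.0] * (N + 1)
    c[N] = 1.0                                    # last exponential: c_N = 1
    for k in range(1, N + 1):
        # contributions of the exponentials to the right of beta**(N - k)
        s = sum(c[N - i]
                * beta ** (-alpha * (k - i))
                * math.exp(alpha * (1 - beta ** (-(k - i))))
                for i in range(k))
        c[N - k] = 1.0 - s
    return c

c = recursive_coefficients(alpha=2.0, beta=4.0, N=8)
print([round(c[8 - k], 3) for k in range(9)])     # c_{N-k} for k = 0..8
```

The printed values can be compared line by line with Table I; the damped oscillation around the limiting value is visible from the second coefficient onwards.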



The recursive ansatz always gives a better result than the uniform one, as it ensures that
$g(\beta^i)$ is closer to $f(\beta^i)$ for all $i$, and particularly for large $i$; $g$
approximates $f$ remarkably well at $x_i = \beta^i$ provided that $\beta$ is not too small.
The differences are most perceptible for $x \simeq \beta^N$, where the recursive scheme gives
a much better approximation (see Fig. 4), which explains why it is most advantageous for
$k < 4$, where it decreases $C$ at $N_m$ by a factor 2 for $k = 2$ and 1.5 for $k = 3$; larger
$k$, hence larger $N_m$, will not bring much improvement since the weight of the discrepancy
caused by the uniform ansatz at $\beta^N$ decreases. Improving the precision further is
possible by taking more exponentials from the left-hand side of $\beta^j$ into account in the
calculation of $c_j$, at the price of heavier and probably non-explicit computations. Finally,
if solving the full set of linear equations for $c_i$ does not give enough precision, the
remaining possibility is to minimise $C$ numerically.

The above approximation has an obvious application to financial markets. Historical volatility
is usually measured with exponential moving averages,

$$V(t + \delta t) = \Lambda V(t) + (1 - \Lambda)\, v(t), \tag{10}$$
where $v(t)$ is some measure of the instantaneous volatility (e.g. daily volatility) over
$\delta t$ units of time, and $\Lambda = e^{-\lambda}$ is the memory. RiskMetrics recommends
$\Lambda_1 = 0.98$ or $\Lambda_2 = 0.94$. While this is an efficient way of computing an
average, it implicitly assumes the choice of a single time scale
$1/|\ln \Lambda| \simeq 1/(1 - \Lambda)$ for $\Lambda$ close to 1. Unfortunately, the
volatility is a process with no obvious time scale, as its autocorrelation function decreases
slowly: fitting it with a power-law gives an exponent $\nu \simeq 0.3$. In other words, any
choice of $\Lambda$ is a compromise between smoothness and reactivity. To our knowledge, the
first paper to use a power-law kernel for measuring volatilities is from the Olsen group. One
possible reason for this particular functional form of the volatility memory is that the
market is made up of heterogeneous participants [11]. For instance, the variety of time scales
of people taking part in financial markets is obvious to any practitioner; hence the choice of
a single $\Lambda$ selects the categories of traders that the resulting average volatility
incorporates. Direct measurement on high-frequency data revealed five characteristic time
scales. Fitting a stochastic volatility model with five time scales, this work found them to
be 0.18, 1.4, 2.8, 7 and 28 business days, with respective weights of 0.39, 0.20, 0.18, 0.12
and 0.11; the time scales span about 2.2 decades, and the weights decrease algebraically as
the time scale grows, with an exponent of about $\alpha = 0.3$. Other work considered
$\alpha = 2$. Generally speaking, $2\alpha - 2 = \nu$, which gives $\alpha = 1.15$ if
$\nu = 0.3$ (see e.g. [7]). For $\alpha = 1.15$, five exponentials best approximate three
decades, with corrections $c = (0.704, 0.702, 0.714, 0.647, 1)$. The average volatility
$\sigma$ is a weighted sum of volatilities on given time scales corresponding to the
$\lambda_i$'s, which, in principle, still requires keeping the returns over a time horizon
equal to the longest time scale; this is barely economical and defeats the initial aim of the
approximation. The solution is to use sums of nested exponential moving averages of the last
return, which are a proxy for returns on larger time scales.
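To illustrate how the approximation removes the need to store past data, the Python sketch below computes a power-law-like weighted average of a series with one recursively updated state per exponential. It is a simplified illustration, not the nested construction mentioned above: it applies independent exponential sums directly to an input series, assumes that the most recent observation sits at $x = 1$ in units of the sampling step, and uses hypothetical function names.

```python
import math

def recursive_coefficients(alpha, beta, N):
    """Correction factors [c_0, ..., c_N] from Eqs. (8) and (9)."""
    c = [0.0] * (N + 1)
    c[N] = 1.0
    for k in range(1, N + 1):
        s = sum(c[N - i] * beta ** (-alpha * (k - i))
                * math.exp(alpha * (1 - beta ** (-(k - i)))) for i in range(k))
        c[N - k] = 1.0 - s
    return c

def kernel(alpha, beta, N):
    """Rates lambda_i and weights c_i * w_i of the exponentials, Eq. (4)."""
    c = recursive_coefficients(alpha, beta, N)
    lam = [alpha * beta ** (-i) for i in range(N + 1)]
    w = [c[i] * (math.e / beta ** i) ** alpha for i in range(N + 1)]
    return lam, w

def long_memory_average(values, lam, w):
    """Running average with kernel g(lag + 1), one state per exponential."""
    decay = [math.exp(-l) for l in lam]
    # normalisation: sum over all lags of g(lag + 1)
    norm = sum(wi * d / (1.0 - d) for wi, d in zip(w, decay))
    state = [0.0] * len(lam)
    out = []
    for v in values:
        state = [d * (s + v) for s, d in zip(state, decay)]
        out.append(sum(wi * s for wi, s in zip(w, state)) / norm)
    return out

# Five exponentials over three decades, alpha = 1.15, as in the text.
alpha, N = 1.15, 4
beta = 10 ** (3 / N)
lam, w = kernel(alpha, beta, N)
series = [abs(math.sin(t)) for t in range(50)]    # synthetic input, illustration only
print(long_memory_average(series, lam, w)[-1])
```

Only $N + 1$ numbers are kept in memory, whatever the length of the series; the price is that the kernel follows the power-law only over the range covered by the exponentials.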



CONCLUSIONS 

We have provided a simple method to use efficiently a sum of weighted exponentials as a
parsimonious approximation of a power-law with any exponent. In particular, we have shown the
existence of an optimal number of exponentials when one neglects the contribution of some
exponentials in the determination of the coefficients. The recursive ansatz is probably
precise enough for most applications.